Lead ClickHouse audit sort key with tenant + LowCardinality columns#126
Merged
Conversation
32ed906 to
35c9cc5
Compare
Greptile Summary
Confidence Score: 4/5The schema changes are focused, but the new live test can reject a valid tenant-leading ClickHouse sort key format. The main implementation intent is clear and narrowly scoped, with attention needed on the assertion logic for ClickHouse's reported sorting key representation. tests/Audit/Adapter/ClickHouseTest.php
What T-Rex did
Reviews (4): Last reviewed commit: "Lead ClickHouse audit sort key with tena..." | Re-trigger Greptile |
4e24c39 to
27f1dc4
Compare
3a5ee07 to
b77d2fb
Compare
New shared audit tables are created natively with ORDER BY (tenant, time, id) plus allow_nullable_key, so tenant-scoped time-range reads prune by tenant first. Single-tenant tables keep the historical (time, id) key. event, actorType, resourceType and country are now LowCardinality columns for smaller storage and faster scans (country wrapped as LowCardinality(Nullable)). The full standard secondary-index set is retained. Migrating a pre-existing (time, id) table (adding/materializing the projection, converting columns to LowCardinality, dropping redundant indexes) is an out-of-band operator step, NOT part of the library. The tenant-leading sort key is covered by a live integration test that runs setup() against the ClickHouse service and asserts system.tables.sorting_key leads with tenant.
b77d2fb to
3f2c1ce
Compare
Comment on lines
+991
to
+994
| $this->assertTrue( | ||
| str_starts_with(trim($sortingKey), 'tenant'), | ||
| "Expected sorting key to lead with 'tenant', got: {$sortingKey}" | ||
| ); |
There was a problem hiding this comment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Leads the ClickHouse audit table sort key with
tenantand stores genuinelylow-cardinality columns as
LowCardinality.New shared audit tables are now created natively with:
ORDER BY (tenant, time, id)plusallow_nullable_key = 1, so tenant-scopedtime-range reads prune by tenant first. Single-tenant (non-shared) tables keep
the historical
(time, id)key.event,actorType,resourceTypeasLowCardinality(String)andcountryas
LowCardinality(Nullable(String))— smaller storage, faster GROUP BY /equality scans. Reads and writes are unchanged.
Scope
This PR is fresh schema only. Because new tables are created with the
tenant-leading key natively, they never need a projection.
Migrating a pre-existing
(time, id)table — adding and materializing thep_tenant_timeprojection, converting columns toLowCardinality, and droppingthe now-redundant bloom indexes — is an out-of-band operator step, run
externally. It is intentionally not part of the library, so
setup()stayssafe to call on every boot.
Tests
A live integration test (
testSharedTableSortKeyLeadsWithTenant) runssetup()against the ClickHouse service in shared-tables mode under a unique namespace,
introspects
system.tables.sorting_key, and asserts the key leads withtenant. It cleans up its table afterwards so it is repeatable.